Tracking Epidemics with Natural Language Processing and Crowdsourcing

Authors

  • Robert Munro
  • Lucky Gunasekara
  • Stephanie Nevins
  • Lalith Polepeddi
  • Evan Rosen
Abstract

The first indication of a new outbreak is often in unstructured data (natural language) and reported openly in traditional or social media as a new ‘flu-like’ or ‘malaria-like’ illness weeks or months before the new pathogen is eventually isolated. We present a system for tracking these early signals globally, using natural language processing and crowdsourcing. By comparison, search-log-based approaches, while innovative and inexpensive, often provide a trailing signal that follows open reports in plain language. Concentrating on discovering outbreak-related reports in big open data, we show how crowdsourced workers can create near-real-time training data for adaptive active-learning models, addressing the lack of broad-coverage training data for tracking epidemics. This is well suited to an outbreak information-flow context, where sudden bursts of information about new diseases/locations need to be manually processed quickly at short notice.
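The abstract does not say how items are chosen for crowdsourced labeling. As a minimal sketch of one common active-learning strategy (uncertainty sampling, assumed here and not confirmed as the authors' method), the reports sent to crowd workers could be those whose predicted probability of being outbreak-related is closest to 0.5. The function name `select_for_labeling`, the report ids, and the scores below are all hypothetical.

```python
from typing import Dict, List


def select_for_labeling(scores: Dict[str, float], k: int) -> List[str]:
    """Return the ids of the k reports the model is least certain about.

    `scores` maps a report id to the model's predicted probability that the
    report is outbreak-related. Uncertainty is measured as distance from 0.5.
    This is an illustrative assumption, not the system described in the paper.
    """
    # Sort by distance from 0.5; ties are broken by id so the result is stable.
    ranked = sorted(scores, key=lambda rid: (abs(scores[rid] - 0.5), rid))
    return ranked[:k]


# Hypothetical model outputs: report id -> P(outbreak-related).
scores = {"r1": 0.97, "r2": 0.52, "r3": 0.08, "r4": 0.41, "r5": 0.66}

# The two most uncertain reports would go to crowd workers for labeling;
# their labels would then be added to the training data before retraining.
batch = select_for_labeling(scores, k=2)
print(batch)
```

In a loop of this kind, each labeled batch is fed back into training, which is one way near-real-time crowd labels could keep a model current as new diseases or locations appear.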

Related resources

Crowdsourcing and curation: perspectives from biology and natural language processing

Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different w...

Perspectives on crowdsourcing annotations for natural language processing

Crowdsourcing has emerged as a new method for obtaining annotations for training models for machine learning. While many variants of this process exist, they largely differ in their method of motivating subjects to contribute and the scale of their applications. To date, however, there has yet to be a study that helps the practitioner to decide what form an annotation application should take to...

Crowdsourcing Annotation for Machine Learning in Natural Language Processing Tasks

Human annotators are critical for creating the necessary datasets to train statistical learners, but annotation cost and limited access to qualified annotators form a data bottleneck. In recent years, researchers have investigated overcoming this obstacle using crowdsourcing, which is the delegation of a particular task to a large group of untrained individuals rather than a select trained few...

Curating an Open Information Extraction Knowledge Base Using Games with a Purpose

We are interested in measuring how games with a purpose can be used as a crowdsourcing solution for transforming a (huge) set of triples extracted from Wikipedia into a useful knowledge base. We describe the natural language processing pipeline used for generating questions that we turned into games. We present three games we implemented.

Colors of People (Les couleurs des gens) [in French]

In Natural Language Processing, and semantic analysis in particular, color information may be important in order to properly process textual information (word sense disambiguation and indexing). More specifically, knowing which colors are generally associated with terms is crucial information. In this paper, we explore how crowdsourcing through a game with a purpose (GWAP) can be an adequate st...

Publication date: 2012